MSE loss is one of the most common loss functions in all of machine learning. For $n$ predictions $\hat{y}_i$ against targets $y_i$, the MSE loss is

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

The constant $\frac{1}{n}$ averages the squared errors over the sample; some formulations use $\frac{1}{2n}$ instead, so that the factor of 2 cancels when the gradient is taken.
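As a concrete illustration, here is a minimal NumPy sketch of the formula above (the function name `mse_loss` is just for this example):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Residuals are 0.5, 0.0, and -1.0, so the loss is (0.25 + 0 + 1) / 3
print(mse_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```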
MSE loss was used in Rumelhart, Hinton, and Williams (1986), the “original” backpropagation paper. It remains a top choice for regression problems in deep learning. (Notice that minimizing the sum of squared errors of a linear predictor is equivalent to ordinary least-squares linear regression.)
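The equivalence with linear regression can be checked numerically: fitting a linear predictor by ordinary least squares recovers the parameters that minimize the MSE. A small sketch, using synthetic data with an assumed true slope of 2.0 and intercept of 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=200)  # noisy linear data

# Minimizing MSE of the predictor w*x + b is exactly least squares,
# which np.linalg.lstsq solves in closed form.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)  # estimates close to the true slope 2.0 and intercept 1.0
```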
MSE is less common for classification tasks, where we are estimating an unknown probability mass function from a (potentially inadequate) sample. In these cases, a loss with an explicit information-theoretic interpretation tends to work better, so cross-entropy loss is generally preferred.
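To make the contrast concrete, here is a minimal sketch of cross-entropy for a single example, assuming a one-hot target and predicted class probabilities (the function name `cross_entropy` is illustrative):

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """Cross-entropy -sum(p * log q) between target and predicted distributions."""
    q = np.clip(np.asarray(q_pred, dtype=float), eps, 1.0)  # avoid log(0)
    return -np.sum(np.asarray(p_true, dtype=float) * np.log(q))

# A confident correct prediction incurs a small loss,
# while a confident wrong prediction is penalized heavily.
print(cross_entropy([1, 0, 0], [0.9, 0.05, 0.05]))
print(cross_entropy([1, 0, 0], [0.05, 0.9, 0.05]))
```

Because the loss is the negative log-probability assigned to the true class, it measures the information cost of the model's predicted distribution, which is the property the paragraph above appeals to.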